20 research outputs found

    Improving the Cross-Lingual Generalisation in Visual Question Answering

    Full text link
    While multilingual vision-language pretrained models offer several benefits, recent benchmarks across various tasks and languages have shown poor cross-lingual generalisation when such models are applied to non-English data, with a large gap between (supervised) English performance and (zero-shot) cross-lingual transfer. In this work, we explore the poor performance of these models on a zero-shot cross-lingual visual question answering (VQA) task, where models are fine-tuned on English visual question data and evaluated on 7 typologically diverse languages. We improve cross-lingual transfer with three strategies: (1) we introduce a linguistic prior objective that augments the cross-entropy loss with a similarity-based loss to guide the model during training, (2) we learn a task-specific subnetwork that improves cross-lingual generalisation and reduces variance without modifying the model, (3) we augment training examples using synthetic code-mixing to promote alignment of embeddings between source and target languages. Our experiments on xGQA with the pretrained multilingual multimodal transformers UC2 and M3P demonstrate the consistent effectiveness of the proposed fine-tuning strategy for 7 languages, outperforming existing transfer methods with sparse models. Code and data to reproduce our findings are publicly available.
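    The first strategy pairs cross-entropy with a similarity-based term. Below is a minimal PyTorch sketch of what such a combined objective could look like, assuming answers are classified over a fixed vocabulary with precomputed answer embeddings; the weight `alpha` and the choice of cosine similarity are illustrative assumptions, not the paper's exact formulation.

```python
import torch.nn.functional as F

def linguistic_prior_loss(logits, targets, answer_embeddings, alpha=0.5):
    """Cross-entropy augmented with a similarity term (hypothetical sketch).

    logits: (batch, n_answers), targets: (batch,),
    answer_embeddings: (n_answers, dim) precomputed answer embeddings.
    """
    ce = F.cross_entropy(logits, targets)
    # Expected answer embedding under the model's predictive distribution.
    probs = logits.softmax(dim=-1)
    pred_emb = probs @ answer_embeddings          # (batch, dim)
    gold_emb = answer_embeddings[targets]         # (batch, dim)
    # Pull the predicted embedding towards the gold answer's embedding.
    sim = F.cosine_similarity(pred_emb, gold_emb, dim=-1)
    return ce + alpha * (1.0 - sim).mean()
```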

    Zero-Shot Cross-Lingual Transfer with Meta Learning

    Full text link
    Learning what to share between tasks has recently been a topic of great importance, as strategic sharing of knowledge has been shown to improve downstream task performance. This is particularly important for multilingual applications, as most languages in the world are under-resourced. Here, we consider the setting of training models on multiple different languages at the same time, when little or no data is available for languages other than English. We show that this challenging setup can be approached using meta-learning, where, in addition to training a source language model, another model learns to select which training instances are the most beneficial to the first. We experiment with standard supervised, zero-shot cross-lingual, and few-shot cross-lingual settings for different natural language understanding tasks (natural language inference, question answering). Our extensive experimental setup demonstrates the consistent effectiveness of meta-learning for a total of 15 languages. We improve upon the state of the art for zero-shot and few-shot NLI (on MultiNLI and XNLI) and QA (on the MLQA dataset). A comprehensive error analysis indicates that the correlation of typological features between languages can partly explain when parameter sharing learned via meta-learning is beneficial. Comment: Accepted as a long paper at the EMNLP 2020 main conference.
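    The core idea of letting a second model choose useful training instances can be sketched as an instance-weighting network. The architecture and sigmoid weighting below are assumptions for illustration; the meta-update of the selector (driven in the paper by performance on the target task) is omitted.

```python
import torch
import torch.nn as nn

class InstanceSelector(nn.Module):
    """Scores each training instance by how beneficial it is expected to be
    for the target task (a hypothetical stand-in for the meta-learner)."""
    def __init__(self, feat_dim):
        super().__init__()
        self.scorer = nn.Sequential(
            nn.Linear(feat_dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, features):                  # features: (batch, feat_dim)
        return torch.sigmoid(self.scorer(features)).squeeze(-1)

def weighted_loss(selector, instance_features, per_example_losses):
    # Reweight each instance's task loss by its learned usefulness score.
    weights = selector(instance_features)         # (batch,)
    return (weights * per_example_losses).mean()
```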

    Boosting Radiology Report Generation by Infusing Comparison Prior

    Full text link
    Recent transformer-based models have made significant strides in generating radiology reports from chest X-ray images. However, a prominent challenge remains: these models often lack prior knowledge, resulting in synthetic reports that mistakenly reference non-existent prior exams. This discrepancy can be attributed to a knowledge gap between radiologists and the generation models: while radiologists possess patient-specific prior information, the models receive only X-ray images at a specific time point. To tackle this issue, we propose a novel approach that leverages a rule-based labeler to extract comparison prior information from radiology reports. This extracted comparison prior is then seamlessly integrated into state-of-the-art transformer-based models, enabling them to produce more realistic and comprehensive reports. Our method is evaluated on English report datasets such as IU X-ray and MIMIC-CXR. The results demonstrate that our approach surpasses baseline models in terms of natural language generation metrics. Notably, our model generates reports that are free from false references to non-existent prior exams, setting it apart from previous models. By addressing this limitation, our approach represents a significant step towards bridging the gap between radiologists and generation models in the domain of medical report generation. Comment: Accepted at ACL 2023, BioNLP Workshop.
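    As a rough illustration of what a rule-based comparison labeler might look like, the regular expressions below flag reports that reference a prior exam; the patterns are hypothetical examples, not the paper's actual rule set.

```python
import re

# Hypothetical comparison phrases; the paper's labeler is more elaborate.
COMPARISON_PATTERNS = [
    r"\bcompared (?:to|with) (?:the )?prior\b",
    r"\b(?:unchanged|stable|improved|worsened) (?:from|since) (?:the )?(?:prior|previous)\b",
    r"\bno (?:significant )?interval change\b",
    r"\bprevious (?:exam|study|radiograph)\b",
]

def has_comparison_prior(report: str) -> bool:
    """Return True if the report references a prior exam."""
    text = report.lower()
    return any(re.search(p, text) for p in COMPARISON_PATTERNS)

print(has_comparison_prior("Cardiomegaly is unchanged from the prior study."))  # True
```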

    ProVoc: An Ontology for Describing Products on the Web

    Get PDF
    Many lines of research have long motivated the use of ontologies to meet the representation needs of e-Commerce. In this article, we present ProVoc (Product Vocabulary), an ontology whose purpose is to describe products on the Web. Complementary to GoodRelations (Hepp, 2008), the most widely used Semantic Web ontology in the e-Commerce world, ProVoc focuses on a fine-grained representation of products and their related entities (product ranges, product composition, etc.). The joint use of the two ontologies broadens the space of queries available to the user, for example: "Which products contain ingredients that are harmful to health? Who sells them?". We show through SPARQL queries that our scenarios find an adequate formulation and a relevant representation with ProVoc. Finally, a competitive-intelligence application in the cosmetics domain is presented.
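    The example query above can be expressed in SPARQL and run with rdflib. The ProVoc namespace and property names below are illustrative assumptions rather than the ontology's exact terms; the gr: prefix is GoodRelations.

```python
from rdflib import Graph

QUERY = """
PREFIX gr: <http://purl.org/goodrelations/v1#>
PREFIX pv: <http://ns.inria.fr/provoc#>

SELECT ?product ?seller WHERE {
  ?product pv:hasIngredient ?ingredient .      # assumed ProVoc property
  ?ingredient pv:isHarmful true .              # assumed ProVoc property
  ?seller gr:offers ?offering .
  ?offering gr:includes ?product .
}
"""

g = Graph()
g.parse("products.ttl", format="turtle")   # assumed local ProVoc dataset
for row in g.query(QUERY):
    print(row.product, row.seller)
```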

    Vision transformer assisting rheumatologists in screening for capillaroscopy changes in systemic sclerosis: an artificial intelligence model.

    Get PDF
    OBJECTIVES The first objective of this study was to implement and assess the performance and reliability of a vision transformer (ViT)-based deep-learning model, an 'off-the-shelf' artificial intelligence solution, for identifying distinct signs of microangiopathy in nailfold capillaroscopy (NFC) images of patients with SSc. The second objective was to compare the ViT's analysis performance with that of practising rheumatologists. METHODS NFC images of patients prospectively enrolled in our European Scleroderma Trials and Research group (EUSTAR) and Very Early Diagnosis of Systemic Sclerosis (VEDOSS) local registries were used. The primary outcome investigated was the ViT's classification performance for identifying disease-associated changes (enlarged capillaries, giant capillaries, capillary loss, microhaemorrhages) and the presence of the scleroderma pattern in these images using a cross-fold validation setting. The secondary outcome involved a comparison of the ViT's performance vs that of rheumatologists on a reliability set, consisting of a subset of 464 NFC images with majority vote-derived ground-truth labels. RESULTS We analysed 17 126 NFC images derived from 234 EUSTAR and 55 VEDOSS patients. The ViT had good performance in identifying the various microangiopathic changes in capillaries by NFC [area under the curve (AUC) from 81.8% to 84.5%]. In the reliability set, the rheumatologists reached a higher average accuracy, as well as a better trade-off between sensitivity and specificity, compared with the ViT. However, the annotators' performance was variable, and one out of four rheumatologists showed equal or lower classification measures compared with the ViT. CONCLUSIONS The ViT is a modern, well-performing and readily available tool for assessing patterns of microangiopathy on NFC images, and it may assist rheumatologists in generating consistent and high-quality NFC reports; however, the final diagnosis of a scleroderma pattern in any individual case needs the judgement of an experienced observer.
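    A minimal sketch of how such an off-the-shelf ViT could be set up for this kind of screening, framed here as multi-label classification over the five reported findings using the timm library; the label set, loss, and hyperparameters are assumptions, not the study's exact configuration.

```python
import timm
import torch
import torch.nn as nn

# Assumed multi-label targets: enlarged capillaries, giant capillaries,
# capillary loss, microhaemorrhages, scleroderma pattern.
model = timm.create_model("vit_base_patch16_224", pretrained=True, num_classes=5)
criterion = nn.BCEWithLogitsLoss()

images = torch.randn(8, 3, 224, 224)              # dummy batch of NFC images
labels = torch.randint(0, 2, (8, 5)).float()      # dummy multi-label targets
loss = criterion(model(images), labels)
loss.backward()
```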
